{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fine-tune a PyTorch Lightning Text Classifier with Ray Data\n",
    "\n",
    "<a id=\"try-anyscale-quickstart-lightning_cola_advanced\" href=\"https://console.anyscale.com/register/ha?render_flow=ray&utm_source=ray_docs&utm_medium=docs&utm_campaign=lightning_cola_advanced\">\n",
    "    <img src=\"../../../_static/img/run-on-anyscale.svg\" alt=\"try-anyscale-quickstart\">\n",
    "</a>\n",
    "<br></br>\n",
    "\n",
    ":::{note}\n",
    "\n",
    "This is an intermediate example demonstrates how to use [Ray Data](https://docs.ray.io/en/latest/data/data.html) with PyTorch Lightning in Ray Train.\n",
    "\n",
    "If you just want to quickly convert your existing PyTorch Lightning scripts into Ray Train, you can refer to the [Lightning Quick Start Guide](train-pytorch-lightning).\n",
    "\n",
    ":::\n",
    "\n",
    "This demo introduces how to fine-tune a text classifier on the [CoLA(The Corpus of Linguistic Acceptability)](https://nyu-mll.github.io/CoLA/) dataset using a pre-trained BERT model. In particular, it follows three steps:\n",
    "- Preprocess the CoLA dataset with Ray Data.\n",
    "- Define a training function with PyTorch Lightning.\n",
    "- Launch distributed training with Ray Train's TorchTrainer."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the following line in order to install all the necessary dependencies:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "SMOKE_TEST = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install numpy datasets \"transformers>=4.19.1\" \"pytorch_lightning>=1.6.5\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Start by importing the needed libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2023-08-14 16:45:51.059256: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA\n",
      "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "2023-08-14 16:45:51.198481: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
      "2023-08-14 16:45:52.005931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\n",
      "2023-08-14 16:45:52.006010: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\n",
      "2023-08-14 16:45:52.006015: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n"
     ]
    }
   ],
   "source": [
    "import ray\n",
    "import torch\n",
    "import numpy as np\n",
    "import pytorch_lightning as pl\n",
    "import torch.nn.functional as F\n",
    "from torch.utils.data import DataLoader, random_split\n",
    "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
    "from datasets import load_dataset, load_metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-process CoLA Dataset\n",
    "\n",
    "CoLA is a dataset for binary sentence classification with 10.6K training examples. First, download the dataset and metrics using the Hugging Face datasets API, and create a Ray Dataset for each split accordingly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = load_dataset(\"glue\", \"cola\")\n",
    "\n",
    "train_dataset = ray.data.from_huggingface(dataset[\"train\"])\n",
    "validation_dataset = ray.data.from_huggingface(dataset[\"validation\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, tokenize the input sentences and pad the ID sequence to length 128 using the `bert-base-uncased` tokenizer. The {meth}`map_batches <ray.data.Dataset.map_batches>` applies this preprocessing function on all data samples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "tokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased\")\n",
    "\n",
    "def tokenize_sentence(batch):\n",
    "    outputs = tokenizer(\n",
    "        batch[\"sentence\"].tolist(),\n",
    "        max_length=128,\n",
    "        truncation=True,\n",
    "        padding=\"max_length\",\n",
    "        return_tensors=\"np\",\n",
    "    )\n",
    "    outputs[\"label\"] = batch[\"label\"]\n",
    "    return outputs\n",
    "\n",
    "train_dataset = train_dataset.map_batches(tokenize_sentence, batch_format=\"numpy\")\n",
    "validation_dataset = validation_dataset.map_batches(tokenize_sentence, batch_format=\"numpy\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define a PyTorch Lightning model\n",
    "\n",
    "You don't have to make any changes to your `LightningModule` definition. Just copy and paste your code here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class SentimentModel(pl.LightningModule):\n",
    "    def __init__(self, lr=2e-5, eps=1e-8):\n",
    "        super().__init__()\n",
    "        self.lr = lr\n",
    "        self.eps = eps\n",
    "        self.num_classes = 2\n",
    "        self.model = AutoModelForSequenceClassification.from_pretrained(\n",
    "            \"bert-base-cased\", num_labels=self.num_classes\n",
    "        )\n",
    "        self.metric = load_metric(\"glue\", \"cola\")\n",
    "        self.predictions = []\n",
    "        self.references = []\n",
    "\n",
    "    def forward(self, batch):\n",
    "        input_ids, attention_mask = batch[\"input_ids\"], batch[\"attention_mask\"]\n",
    "        outputs = self.model(input_ids, attention_mask=attention_mask)\n",
    "        logits = outputs.logits\n",
    "        return logits\n",
    "\n",
    "    def training_step(self, batch, batch_idx):\n",
    "        labels = batch[\"label\"]\n",
    "        logits = self.forward(batch)\n",
    "        loss = F.cross_entropy(logits.view(-1, self.num_classes), labels)\n",
    "        self.log(\"train_loss\", loss)\n",
    "        return loss\n",
    "\n",
    "    def validation_step(self, batch, batch_idx):\n",
    "        labels = batch[\"label\"]\n",
    "        logits = self.forward(batch)\n",
    "        preds = torch.argmax(logits, dim=1)\n",
    "        self.predictions.append(preds)\n",
    "        self.references.append(labels)\n",
    "\n",
    "    def on_validation_epoch_end(self):\n",
    "        predictions = torch.concat(self.predictions).view(-1)\n",
    "        references = torch.concat(self.references).view(-1)\n",
    "        matthews_correlation = self.metric.compute(\n",
    "            predictions=predictions, references=references\n",
    "        )\n",
    "\n",
    "        # self.metric.compute() returns a dictionary:\n",
    "        # e.g. {\"matthews_correlation\": 0.53}\n",
    "        self.log_dict(matthews_correlation, sync_dist=True)\n",
    "        self.predictions.clear()\n",
    "        self.references.clear()\n",
    "\n",
    "    def configure_optimizers(self):\n",
    "        return torch.optim.AdamW(self.parameters(), lr=self.lr, eps=self.eps)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define a training function\n",
    "\n",
    "Define a [training function](train-overview-training-function) that includes all of your lightning training logic. {class}`TorchTrainer <ray.train.torch.TorchTrainer>` launches this function on each worker in parallel. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ray.train\n",
    "from ray.train.lightning import (\n",
    "    prepare_trainer,\n",
    "    RayDDPStrategy,\n",
    "    RayLightningEnvironment,\n",
    "    RayTrainReportCallback,\n",
    ")\n",
    "\n",
    "train_func_config = {\n",
    "    \"lr\": 1e-5,\n",
    "    \"eps\": 1e-8,\n",
    "    \"batch_size\": 16,\n",
    "    \"max_epochs\": 5,\n",
    "}\n",
    "\n",
    "def train_func(config):\n",
    "    # Unpack the input configs passed from `TorchTrainer(train_loop_config)`\n",
    "    lr = config[\"lr\"]\n",
    "    eps = config[\"eps\"]\n",
    "    batch_size = config[\"batch_size\"]\n",
    "    max_epochs = config[\"max_epochs\"]\n",
    "\n",
    "    # Fetch the Dataset shards\n",
    "    train_ds = ray.train.get_dataset_shard(\"train\")\n",
    "    val_ds = ray.train.get_dataset_shard(\"validation\")\n",
    "\n",
    "    # Create a dataloader for Ray Datasets\n",
    "    train_ds_loader = train_ds.iter_torch_batches(batch_size=batch_size)\n",
    "    val_ds_loader = val_ds.iter_torch_batches(batch_size=batch_size)\n",
    "\n",
    "    # Model\n",
    "    model = SentimentModel(lr=lr, eps=eps)\n",
    "\n",
    "    trainer = pl.Trainer(\n",
    "        max_epochs=max_epochs,\n",
    "        accelerator=\"auto\",\n",
    "        devices=\"auto\",\n",
    "        strategy=RayDDPStrategy(),\n",
    "        plugins=[RayLightningEnvironment()],\n",
    "        callbacks=[RayTrainReportCallback()],\n",
    "        enable_progress_bar=False,\n",
    "    )\n",
    "\n",
    "    trainer = prepare_trainer(trainer)\n",
    "\n",
    "    trainer.fit(model, train_dataloaders=train_ds_loader, val_dataloaders=val_ds_loader)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To enable distributed training with Ray Train, configure the Lightning Trainer with the following utilities:\n",
    "\n",
    "- {class}`~ray.train.lightning.RayDDPStrategy`\n",
    "- {class}`~ray.train.lightning.RayLightningEnvironment`\n",
    "- {class}`~ray.train.lightning.RayTrainReportCallback`\n",
    "\n",
    "\n",
    "To ingest Ray Data with Lightning Trainer, follow these three steps:\n",
    "\n",
    "- Feed the full Ray dataset to Ray `TorchTrainer` (details in the next section).\n",
    "- Use {meth}`ray.train.get_dataset_shard <ray.train.get_dataset_shard>` to fetch the sharded dataset on each worker.\n",
    "- Use {meth}`ds.iter_torch_batches <ray.data.Dataset.iter_torch_batches>` to create a Ray data loader for Lightning Trainer.\n",
    "\n",
    ":::{seealso}\n",
    "\n",
    "- {ref}`Lightning Quick Start Guide <train-pytorch-lightning>`\n",
    "- {ref}`User Guides for Ray Data <data-ingest-torch>`\n",
    "\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "if SMOKE_TEST:\n",
    "    train_func_config[\"max_epochs\"] = 2\n",
    "    train_dataset = train_dataset.random_sample(0.1)\n",
    "    validation_dataset = validation_dataset.random_sample(0.1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distributed training with Ray TorchTrainer\n",
    "\n",
    "Next, define a {class}`TorchTrainer <ray.train.torch.TorchTrainer>` to launch your training function on 4 GPU workers. \n",
    "\n",
    "You can pass the full Ray dataset to the `datasets` argument of ``TorchTrainer``. TorchTrainer automatically shards the datasets among multiple workers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ray.train.torch import TorchTrainer\n",
    "from ray.train import RunConfig, ScalingConfig, CheckpointConfig, DataConfig\n",
    "\n",
    "\n",
    "# Save the top-2 checkpoints according to the evaluation metric\n",
    "# The checkpoints and metrics are reported by `RayTrainReportCallback`\n",
    "run_config = RunConfig(\n",
    "    name=\"ptl-sent-classification\",\n",
    "    checkpoint_config=CheckpointConfig(\n",
    "        num_to_keep=2,\n",
    "        checkpoint_score_attribute=\"matthews_correlation\",\n",
    "        checkpoint_score_order=\"max\",\n",
    "    ),\n",
    ")\n",
    "\n",
    "# Schedule four workers for DDP training (1 GPU/worker by default)\n",
    "scaling_config = ScalingConfig(num_workers=4, use_gpu=True)\n",
    "\n",
    "trainer = TorchTrainer(\n",
    "    train_loop_per_worker=train_func,\n",
    "    train_loop_config=train_func_config,\n",
    "    scaling_config=scaling_config,\n",
    "    run_config=run_config,\n",
    "    datasets={\"train\": train_dataset, \"validation\": validation_dataset}, # <- Feed the Ray Datasets here\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"tuneStatus\">\n",
       "  <div style=\"display: flex;flex-direction: row\">\n",
       "    <div style=\"display: flex;flex-direction: column;\">\n",
       "      <h3>Tune Status</h3>\n",
       "      <table>\n",
       "<tbody>\n",
       "<tr><td>Current time:</td><td>2023-08-14 16:51:48</td></tr>\n",
       "<tr><td>Running for: </td><td>00:05:50.88        </td></tr>\n",
       "<tr><td>Memory:      </td><td>34.5/186.6 GiB     </td></tr>\n",
       "</tbody>\n",
       "</table>\n",
       "    </div>\n",
       "    <div class=\"vDivider\"></div>\n",
       "    <div class=\"systemInfo\">\n",
       "      <h3>System Info</h3>\n",
       "      Using FIFO scheduling algorithm.<br>Logical resource usage: 1.0/48 CPUs, 4.0/4 GPUs\n",
       "    </div>\n",
       "    \n",
       "  </div>\n",
       "  <div class=\"hDivider\"></div>\n",
       "  <div class=\"trialStatus\">\n",
       "    <h3>Trial Status</h3>\n",
       "    <table>\n",
       "<thead>\n",
       "<tr><th>Trial name              </th><th>status    </th><th>loc               </th><th style=\"text-align: right;\">  iter</th><th style=\"text-align: right;\">  total time (s)</th><th style=\"text-align: right;\">  train_loss</th><th style=\"text-align: right;\">  matthews_correlation</th><th style=\"text-align: right;\">  epoch</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td>TorchTrainer_b723f_00000</td><td>TERMINATED</td><td>10.0.63.245:150507</td><td style=\"text-align: right;\">     5</td><td style=\"text-align: right;\">         337.748</td><td style=\"text-align: right;\">   0.0199119</td><td style=\"text-align: right;\">              0.577705</td><td style=\"text-align: right;\">      4</td></tr>\n",
       "</tbody>\n",
       "</table>\n",
       "  </div>\n",
       "</div>\n",
       "<style>\n",
       ".tuneStatus {\n",
       "  color: var(--jp-ui-font-color1);\n",
       "}\n",
       ".tuneStatus .systemInfo {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "}\n",
       ".tuneStatus td {\n",
       "  white-space: nowrap;\n",
       "}\n",
       ".tuneStatus .trialStatus {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "}\n",
       ".tuneStatus h3 {\n",
       "  font-weight: bold;\n",
       "}\n",
       ".tuneStatus .hDivider {\n",
       "  border-bottom-width: var(--jp-border-width);\n",
       "  border-bottom-color: var(--jp-border-color0);\n",
       "  border-bottom-style: solid;\n",
       "}\n",
       ".tuneStatus .vDivider {\n",
       "  border-left-width: var(--jp-border-width);\n",
       "  border-left-color: var(--jp-border-color0);\n",
       "  border-left-style: solid;\n",
       "  margin: 0.5em 1em 0.5em 1em;\n",
       "}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(TrainTrainable pid=150507)\u001b[0m 2023-08-14 16:46:02.166995: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA\n",
      "\u001b[2m\u001b[36m(TrainTrainable pid=150507)\u001b[0m To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "\u001b[2m\u001b[36m(TrainTrainable pid=150507)\u001b[0m 2023-08-14 16:46:02.306203: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
      "\u001b[2m\u001b[36m(TrainTrainable pid=150507)\u001b[0m 2023-08-14 16:46:03.087593: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\n",
      "\u001b[2m\u001b[36m(TrainTrainable pid=150507)\u001b[0m 2023-08-14 16:46:03.087670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\n",
      "\u001b[2m\u001b[36m(TrainTrainable pid=150507)\u001b[0m 2023-08-14 16:46:03.087677: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n",
      "\u001b[2m\u001b[36m(TorchTrainer pid=150507)\u001b[0m Starting distributed worker processes: ['150618 (10.0.63.245)', '150619 (10.0.63.245)', '150620 (10.0.63.245)', '150621 (10.0.63.245)']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Setting up process group for: env:// [rank=0, world_size=4]\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Auto configuring locality_with_output=['d4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m 2023-08-14 16:46:10.311338: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m 2023-08-14 16:46:10.408092: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m 2023-08-14 16:46:11.238415: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m 2023-08-14 16:46:11.238492: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m 2023-08-14 16:46:11.238500: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150619)\u001b[0m Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m GPU available: True, used: True\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m TPU available: False, using: 0 TPU cores\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m IPU available: False, using: 0 IPUs\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m HPU available: False, using: 0 HPUs\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Missing logger folder: /home/ray/ray_results/ptl-sent-classification/TorchTrainer_b723f_00000_0_2023-08-14_16-45-57/rank_3/lightning_logs\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m 2023-08-14 16:46:10.337167: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA\u001b[32m [repeated 3x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m 2023-08-14 16:46:10.467812: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m 2023-08-14 16:46:11.270123: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\u001b[32m [repeated 6x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m 2023-08-14 16:46:11.270131: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m \n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m   | Name  | Type                          | Params\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m --------------------------------------------------------\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m 0 | model | BertForSequenceClassification | 108 M \n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m --------------------------------------------------------\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m 108 M     Trainable params\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m 0         Non-trainable params\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m 108 M     Total params\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m 433.247   Total estimated model params size (MB)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9855485835db46d5be3df9ad8aeef168",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150620) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "319d63c5ab5b4fdcb83f40b6250e2aa8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150621) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d33b2a67421a4472b09e391cb19eac01",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150619) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9fd5becbccf448139cd5906118711ce9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150618) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)] -> OutputSplitter[split(4, equal=True)]\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=['d4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b'], preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150619)\u001b[0m Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Missing logger folder: /home/ray/ray_results/ptl-sent-classification/TorchTrainer_b723f_00000_0_2023-08-14_16-45-57/rank_2/lightning_logs\u001b[32m [repeated 3x across cluster]\u001b[0m\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4929168d51234b51b9e0ab72c30d6ae3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150822) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m [W reducer.cpp:1300] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e3a7a20a10d84f829496fb716a17b4d6",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150620) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\u001b[32m [repeated 4x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\u001b[32m [repeated 4x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\u001b[32m [repeated 5x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m [W reducer.cpp:1300] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())\u001b[32m [repeated 3x across cluster]\u001b[0m\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9dc2da867eae4cd08a5d1efbd596ca0b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150621) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ef8287661a6e4af58e9a68a8573c2571",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150619) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a6c8f59f08a64dfbb7d86f05063ee507",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150618) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)] -> OutputSplitter[split(4, equal=True)]\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=['d4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b'], preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\u001b[32m [repeated 3x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\u001b[32m [repeated 4x across cluster]\u001b[0m\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b44b3149e60c458a868c1bdefc2d7f1b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150822) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bb1dc7ec559a47828cd07e980bd90b38",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150620) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2ed7416eada74116a4360666e91a4929",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150621) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9ebb42be9d5640488e36ed0ec018a568",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150619) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0eea5895443f4b06a0957f052f2b542f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150618) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[1m\u001b[36m(autoscaler +2m37s)\u001b[0m Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)] -> OutputSplitter[split(4, equal=True)]\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=['d4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b'], preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\u001b[32m [repeated 2x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\u001b[32m [repeated 2x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\u001b[32m [repeated 3x across cluster]\u001b[0m\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "55002f995b1f4b13a06ab9d9ad26bde4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150822) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "cc83a64b6a7c4d509267a5e491ec2cb3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150620) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c506b0e9ab6049cf935f434946ded564",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150621) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b1d1983ca77b4f2bb765695c6e3d2cff",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150619) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4dec92bf49e44c08bae10a177fa39d4f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150618) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)] -> OutputSplitter[split(4, equal=True)]\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=['d4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b'], preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\u001b[32m [repeated 2x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\u001b[32m [repeated 2x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\u001b[32m [repeated 3x across cluster]\u001b[0m\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "3fe17e7cfe8c46c59f19eeee3f035ac0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150822) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2b3c03c2d16243e9909f0993bcb3236a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150620) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bf189f6d33d349f78319bc0c8cbdfe74",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150621) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "913dbf89b0f049b88a79ded14306343a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150619) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b462dba6a9dd41caaf09cc2328d74123",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150618) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)] -> OutputSplitter[split(4, equal=True)]\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=['d4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b', 'd4dd34cdb4b35e8b1e0f1ab4187b66ed900ab78de951f03e1125233b'], preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\u001b[32m [repeated 2x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150618)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\u001b[32m [repeated 2x across cluster]\u001b[0m\n",
      "\u001b[2m\u001b[36m(SplitCoordinator pid=150822)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\u001b[32m [repeated 3x across cluster]\u001b[0m\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "6a84f019f5394fbc91a19d85ea06eaec",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150822) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9b777e13fc9b4ad887a2954dd50ef2ce",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150620) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150620)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(tokenize_sentence)]\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=2000000000.0), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "\u001b[2m\u001b[36m(RayTrainWorker pid=150621)\u001b[0m Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "397281cffac14ce6a6a6e5ea1f22ccf1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150621) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "dd2a2d8ee918493388f0b3c500087692",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150619) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d68438f4b645455097c1d75d5c6a7c50",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "(pid=150618) Running 0:   0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2023-08-14 16:51:48,299\tINFO tune.py:1146 -- Total run time: 350.99 seconds (350.87 seconds for the tuning loop).\n"
     ]
    }
   ],
   "source": [
    "result = trainer.fit()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "Note that this examples uses Ray Data for data ingestion for faster preprocessing, but you can also continue to use the native `PyTorch DataLoader` or `LightningDataModule`. See {doc}`Train a Pytorch Lightning Image Classifier <lightning_mnist_example>`. \n",
    "\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Result(\n",
       "  metrics={'train_loss': 0.019911885261535645, 'matthews_correlation': 0.577705364544777, 'epoch': 4, 'step': 670},\n",
       "  path='/home/ray/ray_results/ptl-sent-classification/TorchTrainer_b723f_00000_0_2023-08-14_16-45-57',\n",
       "  checkpoint=TorchCheckpoint(local_path=/home/ray/ray_results/ptl-sent-classification/TorchTrainer_b723f_00000_0_2023-08-14_16-45-57/checkpoint_000004)\n",
       ")"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[2m\u001b[1m\u001b[36m(autoscaler +50m28s)\u001b[0m Cluster is terminating (reason: user action).\n"
     ]
    }
   ],
   "source": [
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## See also\n",
    "\n",
    "* [Ray Train Examples](../../examples) for more use cases\n",
    "\n",
    "* [Ray Train User Guides](train-user-guides) for how-to guides"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "build",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  },
  "orphan": true,
  "vscode": {
   "interpreter": {
    "hash": "178108d354ddc93ba36c4b7bfc5283800982aac0e7ca92cc0cf312ad1b8f8b20"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
